Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pick a non-expunged clone source #7283

Merged
merged 5 commits into from
Dec 19, 2024

Conversation

jmpesp
Copy link
Contributor

@jmpesp jmpesp commented Dec 19, 2024

When performing region snapshot replacement, the associated start saga chose the request's region snapshot as the clone source, but if that region snapshot was backed by an expunged dataset then it may be gone.

This commit adds logic to choose another clone source, either another region snapshot from the same snapshot, or one of the read-only regions for that snapshot.

Basic sanity tests were added for ensuring that region replacements and region snapshot replacements resulting from expungement can occur. It was an oversight not to originally include these! Rn order to support these new sanity tests, the simulated pantry has to fake activating volumes in the background. This commit also refactors the simulated Pantry to have one Mutex around an "inner" struct instead of many Mutexes.

Fixes #7209

When performing region snapshot replacement, the associated start saga
chose the request's region snapshot as the clone source, but if that
region snapshot was backed by an expunged dataset then it may be gone.

This commit adds logic to choose another clone source, either another
region snapshot from the same snapshot, or one of the read-only regions
for that snapshot.

Basic sanity tests were added for ensuring that region replacements and
region snapshot replacements resulting from expungement can occur. It
was an oversight not to originally include these! Rn order to support
these new sanity tests, the simulated pantry has to fake activating
volumes in the background. This commit also refactors the simulated
Pantry to have one Mutex around an "inner" struct instead of many
Mutexes.

Fixes oxidecomputer#7209
@jmpesp jmpesp requested a review from leftwo December 19, 2024 15:36
Copy link
Contributor

@leftwo leftwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions to be sure I'm following along here.

nexus/db-queries/src/db/datastore/region.rs Show resolved Hide resolved
dataset_id: params.request.old_dataset_id.into(),
region_id: params.request.old_region_id,
}
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I'm following correclty, we land here with an expected source (request.old_dataset_id, request.old_region_id`) But we need to verify that it's not expunged yet, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, though this function changed a bit as a result of our call :)

);

// Select another region snapshot that's part of this snapshot - they
// will all have identical contents.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand where we are correctly here:

We have a snapshot with a failed downstairs we want to replace.
We have chosen another downstairs target to clone from, but we then discovered that source was on an expunged disk?
So now we are going to try and get that third (final) downstairs as a source?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly yeah!

request: if it's flaky, it shouldn't be used as a clone source!
Copy link
Contributor

@leftwo leftwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

let osagactx = sagactx.user_data();
let log = osagactx.log();

// If the downstairs we're cloning from is on an expunged dataset, the clone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just update this comment to what we discussed over chat

@jmpesp jmpesp enabled auto-merge (squash) December 19, 2024 21:35
Copy link
Contributor

@leftwo leftwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still ship it!

@jmpesp jmpesp merged commit 460f038 into oxidecomputer:main Dec 19, 2024
16 checks passed
@jmpesp jmpesp deleted the pick_non_expunged_clone_source branch December 19, 2024 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read-only region clone attempting to contact expunged downstairs
2 participants